Abstract

We find that the existence of self-organization among the members of a recently
proposed minority game depends on the type of update rules used. The resulting
resource distribution is studied in some detail, and a related strategy scheme is
considered as a tool to improve the understanding of the model.

PACS numbers: 05.65.+b, 02.50.Le, 64.75.+g, 87.23.Ge









The emergence of organization inside a population can be the result of local interactions
between its members. This type of problem has been under study for a long time and
can be schematically reduced, for instance, to an Ising-like model. A problem that has
recently become of interest is the self-organization of a population without direct interactions
between its members, but with a feedback mechanism related to its collective behavior.
The minority game, introduced by Challet and Zhang [1], addresses one of the simplest
situations of this kind. In this model every member of a population has to choose between
two alternatives without knowing what the other members will do. Simple cases are
buying or selling in a stock market, selecting one of two possible routes, etc. At the end of
the day, the winners are those 'agents' that happen to be on the minority side. Feedback
is established by a reward system for winners and losers. In more general terms, these
problems are simple examples of a situation where there is competition for a
limited resource (money, food, free highways, etc.), and individual members of a population
adapt their behavior according to their (recent) experiences. Arthur was the first to propose
this type of approach [2], in what is now known as the El Farol bar problem.

The specific way in which every member of the population makes his choice is generically
designated as his 'strategy'. Different versions of the model are characterized by this strategy
selection. In this work we address the model proposed by Johnson et al. [3]. As in all
minority games, there is an odd number of agents $N$, each choosing between option
"0" (e.g. to buy an asset) and option "1" (to sell the asset). After all agents have made
their choice, the winners, i.e. those in the minority group, gain a point, while those in the
majority group lose a point. A single binary digit, 0 or 1, signals the winning option. Each
agent knows beforehand the previous $m$ outcomes of the game, as well as the outcome that
followed the most recent occurrence of each of the $2^m$ possible bit strings ('histories') of
length $m$. Johnson et al. assign to each agent a single number $p$ ($0 < p < 1$): given a
history, the agent either chooses the same outcome as the one stored in memory, with
probability $p$, or the opposite, with probability $1 - p$. Strategies can be modified as
the game evolves. Thus, if an agent's account falls below a threshold value $d < 0$,
he gets a new strategy, whose value $p'$ is chosen with equal probability from the interval
$(p - r/2, p + r/2)$, where $0 < r < 2$; in what follows we will use the simpler notation
$p \to p' = p \pm \Delta p$. Simultaneously (and, to some extent, arbitrarily), his account is reset to
zero. As we will discuss below, the existence of negative points, combined with the behavior
at the threshold, introduces some confusion when considering the resources. In the
following, we will refer to this combination as the $d$-rule.
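The per-agent ingredients just described can be sketched as follows. This is a minimal Python sketch, not the authors' code: it assumes reflective boundaries at $p = 0$ and $p = 1$ (the text mentions reflective boundary conditions), reads "below a threshold" as the strict inequality account $< d$, and all function names are hypothetical.

```python
import random


def reflect(x: float) -> float:
    """Fold x back into [0, 1]; one reflection suffices because r < 2 keeps |p' - p| < 1."""
    if x < 0.0:
        return -x
    if x > 1.0:
        return 2.0 - x
    return x


def decide(p: float, stored_outcome: int, rng: random.Random) -> int:
    """Repeat the outcome stored for the current history with probability p, else pick the opposite."""
    return stored_outcome if rng.random() < p else 1 - stored_outcome


def apply_d_rule(p: float, account: int, d: int, r: float,
                 rng: random.Random) -> tuple[float, int]:
    """d-rule with update R1: below the threshold d, draw p' uniformly
    from (p - r/2, p + r/2) and reset the account to zero."""
    if account < d:
        return reflect(p + rng.uniform(-r / 2, r / 2)), 0
    return p, account
```

An account exactly equal to $d$ is left alone in this reading; only a further loss triggers the mutation and reset.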

The work of Johnson et al. has shown that, as a result of the correlations established by
these rules, agents self-organize in such a way that the frequency distribution $P(p)$ becomes
strongly peaked around both $p \approx 0$ and $p \approx 1$ (see curve R1 in Fig. 1).

Interesting as it is, this work leaves some questions open. First, it would be interesting to
check the robustness of self-organization under changes in the strategy update rules.
As we will see, it is also of interest in this case to study in some detail the resulting
distribution of resources.

Our notation is as follows. A single realization of the game involves $n_t$ time steps.
All results are then averaged over $n_s$ different samples. The total number of points to be
distributed in this process is $T = N n_t n_s$. We call $n_+$ ($n_-$) the number of positive
(negative) points, i.e. those assigned to winning (losing) agents. Obviously, $T = n_+ + n_-$.
There is also a certain number of points, $N_{\rm lost}$, that are eliminated from the game, namely
those held by any agent changing his strategy, $p \to p'$. After all $n_s$ games are played,
there will be $N_{\rm acc}$ accumulated points, resulting simply from the sum of all accounts at
the end of every game. Note that, in general, $T > N_{\rm lost} + N_{\rm acc}$, because there are both
positive and negative contributions to the accounts. Whenever necessary, we used reflective
boundary conditions at $p = 0$ and $p = 1$.
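To make the bookkeeping concrete, here is a compact single-sample simulation ($n_s = 1$) with update rule R1, written as a sketch under the same assumptions as above (strict threshold, reflective boundaries, hypothetical names, random initial memory). The memory table holds, for each of the $2^m$ histories, the winning outcome that followed its last occurrence. Here $N_{\rm lost}$ is counted as the magnitude of the discarded (negative) balances, which is one possible reading of the definition; with this convention the exact identity $n_+ - n_- = N_{\rm acc} - N_{\rm lost}$ holds.

```python
import random


def reflect(x: float) -> float:
    """Fold x back into [0, 1] (reflective boundaries)."""
    if x < 0.0:
        return -x
    if x > 1.0:
        return 2.0 - x
    return x


def run_game(N=101, m=3, n_t=1000, d=-4, r=0.2, seed=0):
    """One realization of the game with rule R1; returns the point bookkeeping."""
    rng = random.Random(seed)
    p = [rng.random() for _ in range(N)]            # homogeneous initial distribution of p
    account = [0] * N
    P = 2 ** m
    memory = [rng.randint(0, 1) for _ in range(P)]  # outcome after last occurrence of each history
    history = rng.randrange(P)                      # last m outcomes encoded as an m-bit integer
    n_plus = n_minus = n_lost = 0
    for _ in range(n_t):
        stored = memory[history]
        choices = [stored if rng.random() < p[i] else 1 - stored for i in range(N)]
        ones = sum(choices)
        winner = 1 if ones < N - ones else 0        # minority side; N odd, so no ties
        for i in range(N):
            if choices[i] == winner:
                account[i] += 1
                n_plus += 1
            else:
                account[i] -= 1
                n_minus += 1
            if account[i] < d:                      # d-rule: discard the balance, mutate p
                n_lost += -account[i]
                account[i] = 0
                p[i] = reflect(p[i] + rng.uniform(-r / 2, r / 2))
        memory[history] = winner
        history = ((history << 1) | winner) % P     # shift the new outcome into the history
    return {"T": N * n_t, "n_plus": n_plus, "n_minus": n_minus,
            "N_lost": n_lost, "N_acc": sum(account)}


res = run_game()
print(res["n_plus"] / res["T"])   # fraction of positive points, n_+/T
```

Averaging over $n_s$ samples amounts to calling `run_game` with different seeds and summing the counters.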

We have made extensive numerical simulations, both with $p \to p' = p \pm \Delta p$, the original
rule (R1), and with $p \to p' = (1 - p) \pm \Delta p$ (rule R2), a seemingly minor modification
of the update rule for the strategies. In the latter case, any losing agent will change
his mind and pick a 'complementary' strategy; in other words, if the initial selection was,
say, to choose preferentially option "0", then after losing the agent will rather prefer option
"1". The resulting distribution functions are shown in Fig. 1. Our results for R1 reproduce
(without noise) those of Johnson et al. It is apparent that there are important differences
between the two cases: while self-organization shows up very clearly for R1, it is practically
absent for R2, which remains close to its initial (homogeneous) distribution Q. This result
suggests that the presence of self-organization itself depends on the kind of strategy update
employed.
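The difference between the two update rules is easiest to see side by side. A minimal sketch, assuming as before a uniform $\Delta p$ in $(-r/2, r/2)$ and reflective boundaries; the function names are hypothetical.

```python
import random


def reflect(x: float) -> float:
    """Fold x back into [0, 1] (reflective boundaries)."""
    if x < 0.0:
        return -x
    if x > 1.0:
        return 2.0 - x
    return x


def update_r1(p: float, r: float, rng: random.Random) -> float:
    """R1: p' = p ± Δp, with Δp uniform in (-r/2, r/2)."""
    return reflect(p + rng.uniform(-r / 2, r / 2))


def update_r2(p: float, r: float, rng: random.Random) -> float:
    """R2: p' = (1 - p) ± Δp; a losing agent jumps to the complementary strategy."""
    return reflect((1.0 - p) + rng.uniform(-r / 2, r / 2))


rng = random.Random(42)
print(update_r1(0.95, 0.2, rng), update_r2(0.95, 0.2, rng))
```

An agent at $p = 0.95$ stays in $[0.85, 1]$ under R1 but lands in $[0, 0.15]$ under R2.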

Consider now the question of the distribution of the available resources (i.e. points). We
have already mentioned that every agent whose account falls below $d$ gets his account reset to
zero, whereby all those points leave the game. This introduces some confusion when
interpreting our results. The standard interpretation [|TJ only takes into account positive
points. If we ignore the $d$-rule for a moment and simply add up all positive points,
we find that the accumulated earning per agent and per time step, $n_+/T$, is $\approx 0.47$ in both
cases, within a small error. Notice that this is very close to $(N - 1)/2N \approx 0.495$, the maximum
possible gain for $N = 101$. The two rules, however, behave very differently in this regard.
In fact, the distribution of positive points earned by an agent, $C_+(p)$, closely follows the
form of the corresponding $P(p)$ shown in Fig. 1; i.e. while earnings are concentrated around
the extremes for R1, they are spread over the whole interval for R2.
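The bound $(N - 1)/2N$ holds because, with $N$ odd, at most $(N - 1)/2$ agents can be in the minority at each step, so $n_+/T \le (N - 1)/2N$. A short Python check of the arithmetic (the function name is hypothetical):

```python
from fractions import Fraction


def max_positive_fraction(N: int) -> Fraction:
    """Upper bound on n_+/T: at most (N - 1)/2 of the N agents are in the minority each step."""
    if N % 2 == 0:
        raise ValueError("the minority game needs an odd number of agents")
    return Fraction(N - 1, 2 * N)


bound = max_positive_fraction(101)
print(bound, round(float(bound), 3))   # 50/101 0.495
print(round(0.47 / float(bound), 2))   # 0.95, observed n_+/T relative to the bound
```

The measured value $n_+/T \approx 0.47$ is thus about 95% of the largest attainable fraction.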

On the other hand, application of the $d$-rule significantly modifies this interpretation. The
number of points earned by an agent with strategy $p$, $C(p)$, is the algebraic sum of both
positive and negative points. Every time $C(p) < d$, all these points (positive and negative)
are discarded. At the end of all games, it is natural to choose the ratio $G = N_{\rm acc}/T$ as the
magnitude of interest. We find that $G$ is always positive, but vanishes as $T \to \infty$. In fact,
$N_{\rm acc}$ is proportional to both $N$ and $n_s$ but not to $n_t$; since $T = N n_t n_s$, our results imply
$G \sim 1/n_t$. Thus, for instance, for $n_s = 500$ and $n_t = 10^6$, we already have $G \approx 0.00002$.
In other words, although there is self-organization, as described in the work of Johnson et al.,
the net resources distributed among all agents are vanishingly small. It is worth mentioning
that this is not the result of adding negative and positive accounts: on average they are
mostly positive. Quite simply, $C(p)$ closely follows the behavior of $P(p)$, but its magnitude
vanishes in this limit. A large fraction of the points involved in the game end up in $N_{\rm lost}$
and, more importantly, $N_{\rm lost} \approx n_- - n_+$. This can be summarized by saying that in this
case there are no winners in the game [0]. The situation is even more pronounced when
one uses R2, the alternative strategy rule.
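The $1/n_t$ scaling follows directly from the definitions. Writing the observed proportionality as $N_{\rm acc} = \kappa N n_s$, with $\kappa$ a constant introduced here only for illustration:

```latex
T = N\,n_t\,n_s, \qquad N_{\rm acc} = \kappa\,N\,n_s
\quad\Longrightarrow\quad
G = \frac{N_{\rm acc}}{T} = \frac{\kappa}{n_t}.
```

Read this way, the quoted values $n_t = 10^6$ and $G \approx 2 \times 10^{-5}$ correspond to $\kappa = G\,n_t \approx 20$, i.e. roughly 20 accumulated points per agent and per sample at the end of a game, independently of the game length.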

In fact, it is only because the accounts are periodically reset to zero that they appear
to have a mostly positive balance at the end of the games. In their study of this model,
D'hulst and Rodgers || used the Hamming distance between strategies and concluded that,
on average, the number of points earned by an agent, $C(p,t)$, evolves with time $t$ following

$$ C(p,t) = -\left[1 - \tau(p)\right](t - t_0) \tag{1} $$

until $C < d$, at which point it is reset to zero. In the above expression, $\tau(p) < 1/2$
and $t_0$ are constants. This is a sawtooth function of $t$ that is always negative or zero.
This analysis, however, does not properly take into account the role of the $d$-rule, as can
be seen in the following example. Consider a game where the winner is always the same
agent, while all others lose. After $L$ time steps, the winning agent will have $L$ points, while
without the $d$-rule the remaining $N - 1$ agents would have $-L$ each. Whenever $-L < d$,
the $d$-rule implies that only the winner keeps his points. Therefore, after a while, the net
amount of points of all players is necessarily positive.
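The example can be checked with a few lines of Python (a hypothetical helper, not taken from the original work). Each loser's balance is reset as soon as it drops below $d$, so it always stays in $[d, 0]$. The total is therefore at least $L + (N - 1)d$, which is positive once $L > (N - 1)|d|$.

```python
def final_accounts(N: int = 101, L: int = 1002, d: int = -4) -> list[int]:
    """Agent 0 wins every step and agents 1..N-1 lose every step;
    the d-rule resets any account that drops below d."""
    account = [0] * N
    for _ in range(L):
        account[0] += 1
        for i in range(1, N):
            account[i] -= 1
            if account[i] < d:   # d-rule: discard the balance and reset to zero
                account[i] = 0
    return account


acc = final_accounts()
print(acc[0], acc[1], sum(acc))   # 1002 -2 802
```

With $d = -4$ each loser cycles through $-1, \dots, -4, 0$, so for short games (e.g. $L = 12$) the total is still negative. The winner's unbounded gain only dominates after a while.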

We now turn our attention to a related type of strategy. Our main concern here is to 
understand how we can improve the resource distribution, using rules similar to those of 
Ref. i. 

Consider a rule $p \to p'$ that is intrinsically asymmetric (rule R3). In this case,

$$ p' = p_0 \pm \Delta p \tag{2} $$

where $0 < p_0 < 1$ is a constant.
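A sketch of rule R3 under the same assumptions used for R1 and R2 (uniform $\Delta p$ in $(-r/2, r/2)$, reflective boundaries, hypothetical names). Unlike R1 and R2, the new value does not depend on the agent's previous $p$ at all.

```python
import random


def reflect(x: float) -> float:
    """Fold x back into [0, 1] (reflective boundaries)."""
    if x < 0.0:
        return -x
    if x > 1.0:
        return 2.0 - x
    return x


def update_r3(p0: float, r: float, rng: random.Random) -> float:
    """R3, Eq. (2): p' = p0 ± Δp, independent of the agent's previous p."""
    return reflect(p0 + rng.uniform(-r / 2, r / 2))


rng = random.Random(7)
draws = [update_r3(0.8, 0.2, rng) for _ in range(10_000)]
print(min(draws), max(draws), sum(draws) / len(draws))
```

For $p_0 = 0.8$ and $r = 0.2$, every mutating agent lands in $[0.7, 0.9]$, which is why agents accumulate around $p_0$.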

Application of Eq. (2) will move agents to the neighborhood of $p_0$. Eventually, however,
there will be a majority of agents in this region, and therefore all other players will win,
establishing a stationary state. We want to know the dependence of $G$ on $p_0$, $\Delta p$ and $r$.
Note that some cases $p \to p'$ can be described as a superposition of situations with
this type of update scheme. Figure 2(a) illustrates the case $p_0 = 0.8$, $r = 0.2$. The resulting
frequency distribution is asymmetric. In this case there are winners, namely those agents
that manage to have their strategy below 0.5. The left side of Fig. 2(a) shows the gain as a
function of $p$. In Fig. 2(b), on the other hand, we show the gain $G$ (i.e. the integral of the
quantity shown in (a)) as a function of $p_0$, for a fixed value of $r$.

Also, and rather unexpectedly, we can see in Fig. 3 that $G$ is almost independent of
$r$ until it approaches $r \approx 1$, where there is something analogous to a 'phase transition'; it
probably corresponds to the 'transition' between localization around $p_0$ and delocalization.
It should be emphasized that these results are associated with the use of the $d$-rule. Within
the standard interpretation, it can be seen that the gain increases with $r$, at least for $r$
smaller than $\approx 1$.

Finally, it should be pointed out that, although the present version shares the main ideas
of the original minority game [1], in some respects it does not follow the same behavior.
Recent work [7] studies the model of Challet and Zhang in terms of the variable $\alpha = P/N$
(in our case, $P = 2^m$) and the variance of the time series, $\sigma^2 = \langle (n_- - n_+)^2 \rangle$ (here $n_\pm$
denote the numbers of agents on each side at a single time step), or, more specifically, the
reduced variance $z = \sigma^2/N$. As is well known, the random-agent case is given by $\sigma^2 = N$,
i.e. $z = 1$. This value is attained for $\alpha_r \approx 0.2$. Smaller values of $\alpha$ produce a
worse-than-random answer ($z > 1$), while the game output improves if $\alpha > \alpha_r$ ($z < 1$).
Moreover, they have identified two 'phases', characterized by the behavior of $z = z(\alpha)$. For
$\alpha < \alpha_c$, the reduced variance decreases with $\alpha$, but for $\alpha > \alpha_c$ it becomes an
increasing function of $\alpha$. Numerical simulations give the critical value $\alpha_c \approx 0.34$. A theoretical
description of this game has been developed 0, based on an analogy with spin glasses. In
any case, it is apparent that, for fixed $N$, the response of the game depends strongly
on $P$ (i.e. on $m$ in our case). We have not completed a systematic study in this respect, but
it is rather clear that the present model is, in fact, almost totally independent of the actual
value of $m$. On the other hand, we find $z \approx 0.04$ to $0.08$ for our three update rules, even
though $\alpha = 2^3/101 \approx 0.08$ lies in the region where the standard game gives $z > 1$. This
illustrates an important difference between the two formulations.
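For reference, the random-agent benchmark $z = 1$ and the value of $\alpha$ used here can be reproduced with a short Python sketch. The names are hypothetical, and the coin-flip agents are the standard random baseline, not the strategies of the present model.

```python
import random


def alpha(m: int, N: int) -> float:
    """Control parameter alpha = P/N, with P = 2**m possible histories."""
    return 2 ** m / N


def reduced_variance_random(N: int = 101, steps: int = 5000, seed: int = 0) -> float:
    """z = <(n_- - n_+)^2>/N for agents that each pick 0 or 1 with a fair coin."""
    rng = random.Random(seed)
    total = 0
    for _ in range(steps):
        ones = sum(rng.getrandbits(1) for _ in range(N))
        diff = (N - ones) - ones   # difference between the two sides; its square is (n_- - n_+)^2
        total += diff * diff
    return total / (steps * N)


print(round(alpha(3, 101), 3))      # 0.079
print(reduced_variance_random())    # close to 1 for random agents
```

Since each agent contributes an independent $\pm 1$ to the attendance difference, its variance is exactly $N$, so the estimate fluctuates around $z = 1$ with a relative error of order $\sqrt{2/\text{steps}}$.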

This work was partially supported by EC Grant ARG/B7-3011/94/27, Contract 931005
AR.
FIGURES



FIG. 1. Frequency distribution function for two different strategy update rules. R1:
$p \to p' = p \pm \Delta p$; R2: $p \to p' = (1 - p) \pm \Delta p$. In both cases $N = 101$, $n_t = 10^5$, $n_s = 10^4$, $d = -4$,
$m = 3$, $r = 0.2$.

FIG. 2. Strategy rule $p \to p' = p_0 \pm \Delta p$, $N = 101$, $n_t = 10^5$, $n_s = 10^4$, $r = 0.2$, $d = -4$, $m = 3$.
The line in (b) is only a guide to the eye.

FIG. 3. Strategy rule $p \to p' = p_0 \pm \Delta p$; $n_t = 10^5$, $n_s = 500$; $N$, $d$, $m$ have the same values as
in Fig. 2. Continuous lines are only a guide to the eye.